Add handling for carriage returns and backspaces #43

aftoul · 2025-05-30T00:01:35Z

Handle cell outputs containing \r and \b. This PR fixes the issues such as #39, encountered when using TQDM and other similar packages that use special characters to format their output.

github-actions · 2025-05-30T00:01:47Z

👈 Launch a Binder on branch aftoul/jupyter-server-nbmodel/main

krassowski · 2025-05-30T07:21:54Z

Thank you for opening this! I will test locally, but I think we should also add tests mirroring the tests in JupyterLab. Let me know if you need help with that.

Hi @echarles @eleonorecharles can you approve the CI? It looks like my rights were downgraded down to "triage" rather than to "maintain" so I can no longer help with that.

echarles · 2025-05-30T07:31:14Z

@krassowski Just bumped back to maintain. I think that was changed as part of our ISO/SOC certifications initiatives, sorry for that.

aftoul · 2025-05-30T11:47:07Z

@krassowski the tests would be really helpful, can you point me to some docs or the right files to add them? I'd love to implement them.

krassowski · 2025-05-30T12:02:21Z

It think we could create a new test_actions.py in jupyter_server_nbmodel/tests for unit test. One end-to-end test (with a proper execution of code) could also be added in existing jupyter_server_nbmodel/tests/test_handlers.py

aftoul · 2025-05-30T12:06:18Z

Thank you, I'll get on that in the coming few days once I'm free.

echarles · 2025-06-25T06:06:59Z

Thank you, I'll get on that in the coming few days once I'm free.

@aftoul Apart from the test, is there anything else remaining before merging?

cben · 2025-06-25T08:37:10Z

I'm testing this now in a notebook with some logging added...

--- jupyter_server_nbmodel/actions.py
+++ jupyter_server_nbmodel/actions.py
@@ -162,28 +162,32 @@ def _output_hook(outputs: list[NotebookNode], ycell: y.Map | None, msg: dict) ->
 
                     if text.endswith((os.linesep, "\n")):
                         text = text[:-1]
 
                     if (not cell_outputs) or (cell_outputs[-1]["name"] != output["name"]):
                         output["text"] = [handle_carriage_return(handle_backspace(text))]
+                        get_logger().info("??? text=%r | backspace => %r | cr => %r", text, handle_backspace(text), output["text"])
                         cell_outputs.append(output)
                     else:
                         last_output = cell_outputs[-1]
                         old_text = last_output["text"][-1] if len(last_output["text"]) > 0 else ""
                         combined_text = old_text + text
                         if '\r' in combined_text or '\b' in combined_text:
                             if combined_text[-1] == '\r':
                                 suffix = '\r'
                                 combined_text = combined_text[:-1]
                             else:
                                 suffix = ''
                             new_text = handle_carriage_return(handle_backspace(combined_text)) + suffix
+                            get_logger().info("??? (combined_text=%r | backspace => %r | cr => %r) + suffix %r", 
+                                              combined_text, handle_backspace(combined_text), handle_carriage_return(handle_backspace(combined_text)), suffix)
                             last_output["text"][-1] = new_text
                         else:
                             last_output["text"].append(text)
                         cell_outputs[-1] = last_output
+                    get_logger().warning("@@@ text=%r, cell_outputs=%s", output["text"], cell_outputs)

I'm seeing it replace prior chars when printed in one chunk (or buffered together) e.g.:
```
print('1110\b11\r2 34')
```
with handle_backspace -> handle_carriage_return doing their job:
```
[I ...] ??? text='1110\x0811\r2 34' | backspace => '11111\r2 34' | cr => ['2 341']
```

However, when streamed separately:

import time
print('1110\b', end='')
time.sleep(1)
print('11', end='')
time.sleep(1)
print('\r2 ', end='')
time.sleep(1)
print('3', end='')
time.sleep(1)
print('4')

they are only partly combined, the final output is:

111
2
3
4

Log:

[I ...] ??? text='1110\x08' | backspace => '111' | cr => ['111']
[W ...] @@@ text=['111'], cell_outputs=[{"output_type":"stream","name":"stdout","text":["111"]}]
[W ...] @@@ text='11', cell_outputs=[{"output_type":"stream","text":["111","11"],"name":"stdout"}]
[I ...] ??? (combined_text='11\r2 ' | backspace => '11\r2 ' | cr => '2 ') + suffix ''
[W ...] @@@ text='\r2 ', cell_outputs=[{"text":["111","2 "],"output_type":"stream","name":"stdout"}]
[W ...] @@@ text='3', cell_outputs=[{"text":["111","2 ","3"],"output_type":"stream","name":"stdout"}]
[W ...] @@@ text='4\n', cell_outputs=[{"text":["111","2 ","3","4"],"name":"stdout","output_type":"stream"}]

I haven't dug deeper yet but looks like handle_backspace -> handle_carriage_return code path is not always invoked?

I feel there may be a second issue here — you're now handling \b \r, but there was also an issue of starting new lines when the sent data didn't contain \n.
Perhaps the last else: last_output["text"].append(text) branch is responsible?

cben · 2025-06-25T09:41:51Z

jupyter_server_nbmodel/actions.py

+                        old_text = last_output["text"][-1] if len(last_output["text"]) > 0 else ""
+                        combined_text = old_text + text
+                        if '\r' in combined_text or '\b' in combined_text:
+                            if combined_text[-1] == '\r':


It's strange to me that this needs to be special-cased here. I hope it can be folded into the handle_...() functions.

Why do I say that? A terminal has a state machine processing incoming characters, but never cares whether they arrived in one buffer vs. char-by-char, and a char's handling is independent of being first/middle/last in some buffer.
[in practice, there may be some timeouts for incomplete escape sequences? I know on input emacs distinguishes ESC press from prefix by timeout, but don't think terminals do this? Anyway \r and \b are single-char "atomic" sequences, so that's irrelevant.]

It's possible that modeling the state as old_text fed back into the functions same as arriving characters is insufficient. 🤔

block-beta columns 6 space space state[("old_text")] space space kernel space space space space space space handle_cr space handle_b space space space kernel-- "+ incoming text" -->state state --> handle_b handle_b --> handle_cr handle_cr --> state

Loading

More formally, these functions could take 2 args (state, incoming text) and reduce it to updated state? Not sure whether that matters here.

For starters, cursor position is now modeled by cutting off/replacing some trailing characters but perhaps is more natural to model state as (text, cursor) pair?

I added the special case for when incoming text contains \r at the end to ensure the text is shown. Otherwise, the text may be discarded too early and never be shown. So yeah, I guess that is a weakness of the old_text/new_text approach.

The other issue that you pointed out is due to appending incoming text as new lines when neither the old text nor the new one contain a special character. I don't remember if that was part of the original behavior, but what would be the ideal way to handle that here? I'm thinking of concatenating the new output to the end of the last cell until \n is encountered or doing away with the lists and replacing them with a (text, cursor) pair as you suggested.

Otherwise, the text may be discarded too early and never be shown.

Ah yes that's where it gets interesting 🤔

In an ideal world, now that we've had glass TTYs for decades, there'd be some migration path to simpler semantics for apps, where \b would actually eat chars and \r would clear out everything since the last \n.

But what \r does in actual terminals, as well as existing Jupyter client-side output processing, is keep text in place, just move cursor to start of line. E.g. '11111\r 2' results in '2 111' (with cursor at the first 1) — if new text is shorter, remains of previous text are still visible.
This is rarely used; it's more of a nuisance that after \r apps may need to right-pad the next line with spaces to ensure previous gets over-written. But it does give flexibility, as you say, that printing '...\r' each time is near-equivalent to printing '\r...' each time (only difference is where cursor stays most of the time).
Anyway, I don't suppose a plugin shifting execution to server is the place to deviate from that behavior.

Moreover, actual terminals [soft-]wrap long lines and \r only returns to start of current line; at least Jupyter avoids that complication (behaves closer to infinite-width terminal, with horizontal scrolling if necessary).

But your suffix technique is interesting, I only now fully grokked it.
The Y model has no concept of representing or dislaying a "cursor", right?
By sometimes keeping an \r unprocessed — and storing it as such in the Y doc¹ too — it can represent states where a line is not fully overwritten yet 👍.
(¹This falls back on client-side to hide text per \r \b. Funny but OK imho.)

So perhaps even the handle_*() functions themselves could adopt this approach to represent all partially-overwritten states? E.g. now, if \r is in middle and doesn't hit your special code path:

print('11111\r2 ', end='') time.sleep(1) print('b\b3', end='') # \b simply to avoid `last_output["text"].append(text)` branch

results in 2 1113, note how 3 got wrongly tacked at the end even though logical "cursor" should have stayed after the 2 . If the intermediate state was kept in 11111\r2 form, the "cursor position" could be retained.

With my previous suggestion of (text, cursor) pair, the cursor position can't be persisted into Y doc, only kept in-memory in server.
However, it's not like we have any way to continue an in-progress execution when server dies, so I guess that's fine too.

cben · 2025-06-25T10:30:32Z

@aftoul Did you make any progress on tests, or do you want me to write some?

I feel it's not enough to unit-test handle_backspace() & handle_carriage_return(), we also better test some of the logic now living in _output_hook().
My first instinct was to refactor some of that logic into a function 🤔. But looking at test_handlers.py, it seems it already has the scaffolding to test the whole thing including Y model updates.

aftoul · 2025-06-25T10:34:08Z

@cben I started working on the tests but life happened. I'll go back to them now that I have a bit less work

cben · 2025-07-03T12:04:35Z

jupyter_server_nbmodel/actions.py

                    if text.endswith((os.linesep, "\n")):
                        text = text[:-1]


BTW, this was an approximation, it loses data. Regular foo\n should not be treated same as foo.

And it'd be nice to tie the array structure of "text": [...] to newlines?

In regular Jupyterlab, here is what I see:

and here is the .ipynb representation:

> jq .cells[].outputs --compact-output Untitled.ipynb [{"name":"stdout","output_type":"stream","text":["no trailing"]}] [{"name":"stdout","output_type":"stream","text":["1 trailing\n"]}] [{"name":"stdout","output_type":"stream","text":["2 trailing\n","\n"]}] [{"name":"stdout","output_type":"stream","text":["foobar\n","baz"]}] []

=> in doc model, "text" array tends to start new item after every newline. 📜

https://nbformat.readthedocs.io/en/latest/format_description.html says it doesn't matter!

Some fields, such as code input and text output, are characteristically multi-line strings. When these fields are written to disk, they may be written as a list of strings, which should be joined with '' when reading back into memory. In programmatic APIs for working with notebooks (Python, Javascript), these are always re-joined into the original multi-line string. ...

However, keeping item-per-line is nice for .ipynb diff-ability.

=> in doc model, the last line does record whether it had a trailing newline. ✍️

=> the UI does not show whether the the last line had trailing newline 🙈

In a terminal, lack of trailing newline is evident by next prompt just appearing on same line e.g. no trailing>>> but that'd be a sin against the structuredness of notebooks. And since 1 newline is the common case, UI didn't want to spend vertical space rendering it. (the first two cells have exactly same height)

UI does reflect >1 newlines at the end by showing blank lines.

But in theory, future/alternate UI might reflect trailing newline.

For comparison, here is the structure for same cells I'm getting with this branch:

[{"name":"stdout","output_type":"stream","text":["no trailing"]}] [{"name":"stdout","output_type":"stream","text":["1 trailing"]}] [{"name":"stdout","output_type":"stream","text":["2 trailing\n"]}] [{"name":"stdout","output_type":"stream","text":["foo","bar\nbaz"]}]

eats one newline off the 1 & 2 cases 🐞

the array structure — while formally not important — reflects how it got streamed, less stable for diffs.

❓ However, it appears the current Jupyterlab is not as lenient as the spec I cited, and does NOT concat multiple items without newlines into one line?

for i in range(30): print(i, end=' ') import time; time.sleep(1)

upstream results in json:

[{"name":"stdout","output_type":"stream","text":["0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 "]}]

which renders all on one line. With this extension (with or without this branch), I get:

[{"name":"stdout","output_type":"stream","text":["0 ","1 ","2 ","3 ","4 ","5 ","6 ","7 ","8 ","9 ","10 ","11 ","12 ","13 ","14 ","15 ","16 ","17 ","18 ","19 ","20 ","21 ","22 ","23 ","24 ","25 ","26 ","27 ","28 ","29 "]}]

which is supposed to be equivalent yet shows up in browser as each number on a separate line!

krassowski · 2025-07-19T08:57:20Z

Highlighting #48 as of double relevance here:

it introduced a tiny merge conflict which should be easy to resolve (cell_outputs[-1]["name"] became cell_outputs[-1].get("name", None))
it added a utility rtc_test_notebook fixture which makes writing tests for testing output behaviour a bit easier

Add handling for carriage returns and backspaces

29d4096

krassowski added the bug Something isn't working label May 30, 2025

krassowski linked an issue May 30, 2025 that may be closed by this pull request

Standard tqdm displays on multiple lines when jupyter_server_nbmodel is installed #39

Open

cben reviewed Jun 25, 2025

View reviewed changes

cben reviewed Jul 3, 2025

View reviewed changes

krassowski mentioned this pull request Jul 18, 2025

Crash when executing a cell with an output which does not have a name #47

Closed

Uh oh!

Add handling for carriage returns and backspaces #43

Are you sure you want to change the base?

Add handling for carriage returns and backspaces #43

Uh oh!

Conversation

aftoul commented May 30, 2025

Uh oh!

github-actions bot commented May 30, 2025

Uh oh!

krassowski commented May 30, 2025

Uh oh!

echarles commented May 30, 2025

Uh oh!

aftoul commented May 30, 2025

Uh oh!

krassowski commented May 30, 2025

Uh oh!

aftoul commented May 30, 2025

Uh oh!

echarles commented Jun 25, 2025

Uh oh!

cben commented Jun 25, 2025

Uh oh!

cben Jun 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

aftoul Jun 25, 2025

Choose a reason for hiding this comment

Uh oh!

cben Jul 3, 2025

Choose a reason for hiding this comment

Uh oh!

cben commented Jun 25, 2025

Uh oh!

aftoul commented Jun 25, 2025

Uh oh!

cben Jul 3, 2025

Choose a reason for hiding this comment

=> in doc model, "text" array tends to start new item after every newline. 📜

=> in doc model, the last line does record whether it had a trailing newline. ✍️

=> the UI does not show whether the the last line had trailing newline 🙈

Uh oh!

cben Jul 3, 2025

Choose a reason for hiding this comment

Uh oh!

krassowski commented Jul 19, 2025

Uh oh!

Uh oh!

cben Jun 25, 2025 •

edited

Loading